Global speech user interface

ABSTRACT

A global speech user interface (GSUI) comprises an input system to receive a user's spoken command, a feedback system along with a set of feedback overlays to give the user information on the progress of his spoken requests, a set of visual cues on the television screen to help the user understand what he can say, a help system, and a model for navigation among applications. The interface is extensible to make it easy to add new applications.

CROSS-REFERENCE TO RELATED APPLICATIONS

[0001] This application claims priority to U.S. Provisional Patent Application No. 60/327,207, filed Oct. 3, 2001 (Attorney Docket No. AGLE0050PR).

FIELD OF THE INVENTION

[0002] This invention relates generally to interactive communications technology, and more particularly to a speech-activated user interface used in a communications system for cable television or other services.

BACKGROUND OF THE INVENTION

[0003] Speech recognition systems have been in development for more than a quarter of a century, resulting in a variety of hardware and software tools for personal computers. Products and services employing speech recognition are being developed rapidly and are continuously being applied to new markets.

[0004] With the increasing sophistication of speech recognition, networking, and telecommunication technologies, a multifunctional speech-activated communications system that incorporates TV program service, video on demand (VOD) service, Internet service, and so on has become possible. This trend toward integration, however, creates new technical challenges, one of which is the provision of a speech-activated user interface for managing access to the different services. A simple, easy-to-use speech-activated user interface is essential, for example, to implementing a cable service system that is more user-friendly and more interactive.

[0005] In a video on demand (VOD) system, cable subscribers pay a fee for each program that they want to watch, and they may have access to the video for several days. While they have such access, they can start the video at any time, watch it as many times as they like, and use VCR-like controls to fast forward and rewind. One of the problems with button-enabled video on demand systems is that navigation is awkward. Cable subscribers frequently need to press the page up/down buttons repeatedly until they find the movie they want. This is also impractical in speech-enabled systems, because there are limits to the number of items that the speech recognition system can handle at once. What is desired is a powerful interface that gives users more navigation options without degrading recognition accuracy. For example, the interface might enable users, when viewing a movie list, to say the name of a movie within that list and be linked to the movie information screen.

[0006] The interactive program guide (IPG) is the application that cable subscribers use to find out what's on television. One of the problems with button-enabled program guides is that navigation is awkward. Cable subscribers frequently need to press the page up/down buttons repeatedly until they find the program they want. What is further desired is a streamlined interface in which many common functions can be performed with fewer voice commands. For example, such an interface would allow the use of spoken commands to control all IPG functionality.

[0007] Another problem is that the user must switch to the program guide to find out what's on and then switch back to watch the program. There are some shortcuts, but finding programs and then switching to them still requires many button presses. What is further desired is an application that allows cable subscribers to get one-step access to programs they want to watch without ever switching away from the current screen.

[0008] Another important issue in the design of a speech-activated user interface is responsiveness. To interact with the communications system effectively, the user is required to give acceptable commands, and the communications system is required to provide instant feedback. A regular user, however, may not be able to remember the spoken commands used in the speech interface system. What is further desired is an efficient mechanism to provide immediate and consistent visual feedback messages consisting of frequently used commands, speakable text, and access to the main menu, as well as escalating levels of help in the event of unsuccessful speech recognition.

SUMMARY OF THE INVENTION

[0009] This invention provides a global speech user interface (GSUI) that supports the use of speech as a mechanism for controlling digital TV and other content. The functionality and visual design of the GSUI are consistent across all speech-activated applications and services. The visual design may include the use of an agent as an assistant to introduce concepts and guide the user through the functionality of the system. Specific content in the GSUI may be context-sensitive and customized to the particular application or service.

[0010] The presently preferred embodiment of the GSUI consists of the following elements: (1) an input system, which includes a microphone incorporated in a standard remote control with a push-to-talk button, for receiving the user's spoken command (i.e., speech command); (2) a speech recognition system for transcribing a spoken command into one or more commands acceptable by the communications system; (3) a navigation system for navigating among applications run on said communications system; (4) a set of overlays on the screen to help users understand the system and to provide user feedback in response to inputs; and (5) a user center application providing additional help, training and tutorials, settings, preferences, and speaker training.

[0011] The overlays are classified into four categories: (1) a set of immediate speech feedback overlays; (2) a help overlay or overlays that provide a context-sensitive list of frequently used speech-activated commands for each screen of every speech-activated application; (3) a set of feedback overlays that provide information about a problem that said communications system is experiencing; and (4) a main menu overlay that shows a list of services available to the user, each of said services being accessible by spoken command.

[0012] An immediate speech feedback overlay is a small tab that provides simple, non-textual, and quickly understood feedback to the user about the basic operation of the GSUI. It shows the user when the communications system is listening to or processing an utterance, whether or not the application is speech-enabled, and whether or not the utterance has been understood.

[0013] The last three categories of overlays are dialog boxes, each of which may contain a tab indicating a specific state of the speech recognition system, one or more text boxes to convey service information, and one or more virtual buttons that can be selected either by spoken command or by pressing the corresponding physical buttons of the remote control device.

[0014] The help overlay provides a list of context-sensitive spoken commands for the current speech-activated application and is accessible at all times. It also provides brief instructions about what onscreen text is speakable and links to more help in the user center and the main menu. Here, the term “speakable” is synonymous with “speech-activated” and “speech-enabled.”

[0015] Feedback overlays include recognition feedback overlays and application feedback overlays. Recognition feedback overlays inform the user that there has been a problem with recognition. The type of feedback that is given to the user includes generic “I don't understand” messages, lists of possible recognition matches, and more detailed help for improving recognition. Application feedback overlays inform the user about errors or problems with the application that are not related to unsuccessful recognition.

[0016] The main menu overlay provides the list of digital cable services that are available to the user. The main menu overlay is meant to be faster and less intrusive than switching to the multiple system operator's full-screen list of services.

[0017] One deployment of the GSUI is for the Interactive Program Guide (IPG), which is the application that the cable subscribers use to find out what's on television. The GSUI provides a streamlined interface where many common functions can be performed more easily by voice. The GSUI for the IPG allows the use of spoken commands to control all IPG functionality. This includes: (1) selecting on-screen “buttons”; (2) directly accessing any program or channel in the current time slot; and (3) performing every function that can be executed with remote control key presses.

[0018] Another deployment of the GSUI is for Video on Demand (VOD), which functions as an electronic version of a video store. The GSUI provides a streamlined interface where many common functions can be performed more easily by voice. The GSUI for the VOD allows the use of spoken commands to control all VOD functionality. This includes: (1) selecting on-screen “buttons”; (2) directly accessing any movie title in a particular list; and (3) performing every function that can be executed with remote control key presses.

[0019] Another deployment of the GSUI is for a user center, which is an application that provides: (1) training and tutorials on how to use the system; (2) more help with specific speech-activated applications; (3) user account management; and (4) user settings and preferences for the system.

[0020] Another aspect of the invention is the incorporation of a Speaker ID function in the GSUI. Speaker ID is a technology that allows the speech recognition system to identify a particular user from his spoken utterances. For the system to identify the user, the user must briefly train the system, with perhaps 45 seconds of speech. When the system is fully trained, it can identify that particular speaker out of many other speakers. In the present embodiment, Speaker ID improves recognition accuracy. In other embodiments, Speaker ID allows the cable service to show a custom interface and personalized television content for a particular trained speaker. Speaker ID can also allow simple and immediate parental control. Thus, for example, an utterance itself, rather than a PIN, can be used to verify access to blocked content.

[0021] The advantages of the GSUI disclosed herein are numerous. For example: first, it provides feedback about the operation of the speech input and recognition systems; second, it shows the frequently used commands on screen, so a user does not need to memorize the commands; third, it provides a consistent visual reference to speech-activated text; and fourth, it provides help information in a manner that does not obstruct screen viewing.

BRIEF DESCRIPTION OF THE DRAWINGS

[0022] FIG. 1 is a block diagram illustrating an exemplary communications system providing digital cable services according to the invention;

[0023] FIG. 2A shows six basic tabs used to indicate immediate feedback information;

[0024] FIGS. 2B, 2C, 2D, and 2E are flow diagrams illustrating an exemplary process by which the communications system displays immediate feedback overlays on the screen;

[0025] FIG. 3A is a sequence diagram showing the timeline of a normal spoken command;

[0026] FIG. 3B is a sequence diagram showing the timeline when the spoken command is interrupted by a button input (case 1);

[0027] FIG. 3C is a sequence diagram showing the timeline when the spoken command is interrupted by a button input (case 2);

[0028] FIG. 3D is a sequence diagram showing the timeline when the spoken command is interrupted by a button input (case 3);

[0029] FIG. 3E is a sequence diagram showing the timeline in a case where execution of a spoken command is interrupted by a new speech input;

[0030] FIG. 4 is a flow diagram illustrating a process by which the help overlay appears and disappears;

[0031] FIG. 5 is a flow diagram illustrating a process by which the main menu overlay appears and disappears;

[0032] FIG. 6A is a graphic diagram illustrating an exemplary help overlay dialog box used in the TV screen user interface; and

[0033] FIG. 6B is a screen capture showing the appearance of the help overlay dialog box illustrated in FIG. 6A.

DETAILED DESCRIPTION

[0034] A Communications System Providing Digital Cable Service

[0035] Illustrated in FIG. 1 is an exemplary communications system 100 for facilitating an interactive digital cable service into which a global speech user interface (GSUI) is embedded. The user interacts with the communications system by giving spoken commands via a remote control device 110, which combines universal remote control functionality with a microphone and a push-to-talk button acting as a switch. The remote control device in the presently preferred embodiment of the invention is fully compatible with the Motorola DCT-2000 (all of the standard DCT-2000 remote buttons are present). The spoken commands are transmitted from the remote control device 110 to the receiver 120 when the cable subscriber presses the push-to-talk button and speaks into the microphone. The receiver 120 sends the received speech input to a set-top box (STB) 130.

[0036] The STB 130 forwards the speech input to the head-end 150, which is the central control center for a cable TV system. The head-end 150 includes a speech engine 160, which comprises a speech recognizer 170 and an application wrapper 180. The speech recognizer 170 attempts to transcribe the received speech input into textual information represented by binary streams. The output of the speech recognizer 170 is processed by the application wrapper 180, which dynamically generates a set of navigation grammars and a vocabulary, and attempts to determine whether a speech input has been recognized or not. Here, a navigation grammar means a structured collection of words and phrases bound together by rules that define the set of all utterances that can be recognized by the speech engine at a given point in time.
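The following minimal Python sketch makes the notion of a navigation grammar concrete. It assumes the grammar can be flattened into a set of normalized phrases built from three sources: the speakable items on the current screen, the current application's commands, and a few global commands. `build_navigation_grammar`, `GLOBAL_COMMANDS`, and the "watch"/"show" carrier-phrase rule are hypothetical names, not part of the described system. A deployed engine would more likely compile a rule-based grammar format than enumerate phrases.

```python
from typing import FrozenSet, Iterable

# Hypothetical commands assumed to be speakable on every screen.
GLOBAL_COMMANDS = ("help", "main menu", "exit")


def normalize(phrase: str) -> str:
    """Lower-case a phrase and collapse runs of whitespace."""
    return " ".join(phrase.lower().split())


def build_navigation_grammar(
    onscreen_items: Iterable[str],
    app_commands: Iterable[str],
    prefixes: Iterable[str] = ("watch", "show"),
) -> FrozenSet[str]:
    """Return every utterance accepted on the current screen.

    Each on-screen item may be spoken alone or after one of the
    (hypothetical) carrier-phrase prefixes, e.g. "watch vertigo".
    """
    prefixes = tuple(normalize(p) for p in prefixes)
    grammar = {normalize(c) for c in GLOBAL_COMMANDS}
    grammar.update(normalize(c) for c in app_commands)
    for item in onscreen_items:
        name = normalize(item)
        grammar.add(name)
        grammar.update(f"{p} {name}" for p in prefixes)
    return frozenset(grammar)


def is_in_grammar(transcript: str, grammar: FrozenSet[str]) -> bool:
    """True if the recognizer's transcript is an utterance the screen accepts."""
    return normalize(transcript) in grammar
```

Regenerating the set whenever the screen changes keeps the active vocabulary small. That is one way to offer direct access to on-screen titles while respecting the recognizer's limit on how many items it can handle at once.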

[0037] When the speech input is recognized, the application wrapper 180 transforms the speech input into commands acceptable by the application server 190, which then carries out the user's requests. The application server 190 may or may not reside on the speech engine 160. During the process, the communications system 100 returns a set of feedback information to the TV screen via STB 130. The feedback information is organized into an overlay on the screen.

[0038] Television Screen Interface—Functionality and Flows

[0039] The television screen interface elements of the Global Speech User Interface (GSUI) include (1) immediate speech feedback overlays; (2) instructive speech feedback overlays; (3) help overlays; (4) main menu overlays; and (5) speakable text indicators.

[0040] Immediate Speech Feedback

[0041] Immediate speech feedback provides real-time, simple, graphic, and quickly understood feedback to the cable subscriber about the basic operation of the GSUI. This subtle, non-textual feedback gives necessary information without being distracting. FIG. 2A illustrates various exemplary tabs used to indicate such feedback information. In the preferred embodiment, the immediate speech feedback displays the following six basic states (those skilled in the art will appreciate that the invention comprehends other states or representations as well):

[0042] (1) The push-to-talk button is pressed down: the system has detected that the button on the remote has been pressed and is listening to the cable subscriber. On the screen, a small tab 211 is displayed that includes, for example, a highlighted or solid identity indicator or brand logo.

[0043] (2) The application or screen is not speech-enabled. When the user presses the push-to-talk button, a small tab 212 is displayed that includes a prohibition sign overlaid on a non-highlighted brand logo.

[0044] (3) The system is processing an utterance, i.e., the period between the release of the push-to-talk button and the resulting action of the communications system. On the screen, a small tab 213 is displayed that includes a transparent or semi-transparent (for example, 40% transparent) flashing brand logo. The tab 213 is alternated with an empty tab to achieve the flashing effect.

[0045] (4) Application is alerted. On the screen, a small tab 214 is displayed that includes a yellow exclamation point overlaid on a non-highlighted brand logo. It may have different variants. For example, it may come with a short dialog message (variant 214A) or a long dialog message (variant 214B).

[0046] (5) Successful recognition has occurred and the system is executing an action. On the screen, a small tab 215 is displayed that includes a green check mark overlaid on a non-highlighted brand logo.

[0047] (6) Unsuccessful recognition has occurred. After the first try, the recognition feedback overlay is also displayed. On the screen, a small tab 216 is displayed that includes a red question mark overlaid on a non-highlighted brand logo.
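The six states can be modeled as a small enumeration. The sketch below is illustrative only. The enum names are hypothetical, and the values simply reuse the FIG. 2A reference numerals. `processing_frames` shows one possible way (an assumption) to realize the alternation of tab 213 with an empty tab.

```python
from enum import Enum
from typing import List, Optional


class FeedbackTab(Enum):
    """Six basic immediate-feedback states; values mirror FIG. 2A numerals."""
    LISTENING = 211           # highlighted or solid brand logo
    NOT_SPEECH_ENABLED = 212  # prohibition sign over a non-highlighted logo
    PROCESSING = 213          # flashing, semi-transparent logo
    ALERT = 214               # yellow exclamation point
    RECOGNIZED = 215          # green check mark
    NOT_RECOGNIZED = 216      # red question mark


def processing_frames(n_frames: int) -> List[Optional[FeedbackTab]]:
    """Alternate tab 213 with an empty tab (None) to produce the flashing effect."""
    return [FeedbackTab.PROCESSING if i % 2 == 0 else None for i in range(n_frames)]
```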

[0048] These states are shown in the following set of four flowcharts (FIG. 2B through FIG. 2E). Note that in the preferred embodiment, the conventional remote control buttons are disabled while the push-to-talk button is pressed, and that once the system has started processing a spoken command, the push-to-talk button is disabled until the cable subscriber receives notification that the recognition was successful, unsuccessful, or stopped.

[0049] FIGS. 2B, 2C, 2D and 2E are flow diagrams illustrating an exemplary process 200 by which the communications system displays immediate feedback overlays on the screen. FIG. 2B illustrates the steps 200(a)-200(g) of the process:

[0050] 200(a): Checking whether the current screen is speech-enabled when the push-to-talk button is pressed.

[0051] 200(b): If the current screen is speech-enabled, displaying a first tab 211 signaling that a speech input system is activated. This first tab 211 includes a highlighted or solid brand logo.

[0052] 200(c): If the current screen is not speech-enabled, displaying a second tab 212 signaling a non-speech-enabled alert. This second tab 212 includes a prohibition sign overlaid on a non-highlighted brand logo. It stays on screen for an interval of, for example, about ten seconds.

[0053] 200(d): If the push-to-talk button is pressed again before or after the second tab 212 disappears, repeating 200(a).

[0054] Step 200(b) is followed by the steps 200(e), 200(f), and 200(g).

[0055] 200(e): If the push-to-talk button is not released within a second interval (about 10 seconds, for example), interrupting recognition.

[0056] 200(f): If the push-to-talk button is released after a third interval (about 0.1 second, for example) has lapsed but before the second interval in step 200(e) has lapsed, displaying a third tab 213 signaling that speech recognition is in progress. This third tab includes a transparent or semi-transparent flashing brand logo.

[0057] 200(g): If the push-to-talk button is released before the third interval has lapsed, removing any tab from the screen.
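Steps 200(a) through 200(g) reduce to two decisions: which tab to show on press, and what to do once the hold duration is known. This Python sketch uses the example intervals from the text, and the function names are hypothetical. The flow does not say what happens at an exact boundary value, such as a hold of exactly ten seconds. Treating that case as interrupting recognition is an assumption.

```python
from typing import Optional

# Example values from the text; both are described as adjustable.
THIRD_INTERVAL_S = 0.1    # shortest hold treated as an utterance
SECOND_INTERVAL_S = 10.0  # longest hold before recognition is interrupted


def tab_on_press(screen_speech_enabled: bool) -> str:
    """Steps 200(a)-200(c): choose the tab shown when push-to-talk goes down."""
    return "211" if screen_speech_enabled else "212"


def tab_after_hold(hold_s: float) -> Optional[str]:
    """Steps 200(e)-200(g) on a speech-enabled screen.

    Returns the next tab to show, or None when every tab is removed.
    """
    if hold_s >= SECOND_INTERVAL_S:
        return "214A"  # 200(e): recognition interrupted; FIG. 2E takes over
    if hold_s < THIRD_INTERVAL_S:
        return None    # 200(g): released too quickly; clear the screen
    return "213"       # 200(f): the utterance is being processed
```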

[0058] Note that FIG. 2B includes a double press of the talk button. The action to be taken may be designed according to need. A double press has occurred when there are 400 ms or less between the “key-up” of a primary press and the “key-down” of a secondary press.
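The 400 ms rule can be checked directly from key timestamps. In the hypothetical helpers below, a press is recorded as a (key_down_ms, key_up_ms) pair. That representation and the function names are assumptions made for illustration.

```python
from typing import List, Tuple

DOUBLE_PRESS_GAP_MS = 400  # maximum key-up to key-down gap


def is_double_press(primary_key_up_ms: int, secondary_key_down_ms: int) -> bool:
    """True if the secondary key-down follows the primary key-up within 400 ms."""
    gap = secondary_key_down_ms - primary_key_up_ms
    return 0 <= gap <= DOUBLE_PRESS_GAP_MS


def find_double_presses(presses: List[Tuple[int, int]]) -> List[int]:
    """Given (key_down_ms, key_up_ms) pairs in time order, return the
    indices of presses that complete a double press with their predecessor."""
    return [
        i for i in range(1, len(presses))
        if is_double_press(presses[i - 1][1], presses[i][0])
    ]
```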

[0059] FIG. 2C illustrates the steps 200(f)-200(k) of the process. Note that when there is no system congestion, there should rarely be a need for the cable subscriber to press a remote control button while a spoken command is being processed. When there is system congestion, however, the cable subscriber should be able to use the remote control buttons to improve response time. An extensive discussion of when cable subscribers can issue a second command while the first is still in progress, and what happens when they do so, is given after the description of this process.

[0060] Step 200(f) is followed by steps 200(h) and 200(i):

[0061] 200(h): If the STB 130 in FIG. 1 takes longer than a fourth interval (five seconds, for example), measured from the time that the cable subscriber releases the push-to-talk button to the time the last speech data is sent to the head-end 150, interrupting speech recognition processing and displaying a fourth tab 214V (a variant of the tab 214) signaling an application alert. The fourth tab 214V includes a yellow exclamation point with a short dialog message such as a “processing too long” message. It stays on the screen for a fifth interval (about 10 seconds, for example).

[0062] 200(i): If a remote control button other than the push-to-talk button is pressed while a spoken command is being processed, interrupting speech recognition processing and removing any tab from the screen.

[0063] Step 200(h) may be further followed by steps 200(j) and 200(k):

[0064] 200(j): If the push-to-talk button is pressed again while the fourth tab 214V is on the screen, removing the fourth tab and repeating 200(a). This step illustrates a specific situation where the recognition processing takes too long. Note that it does not happen every time the fourth tab is on the screen.

[0065] 200(k): When said fifth interval lapses, or if a remote control button other than the push-to-talk button is pressed while said fourth tab 214V is on the screen, removing said fourth tab from the screen.
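The following is a sketch of the timeout in 200(h) and of the handling of tab 214V in 200(j) and 200(k). It assumes timestamps in seconds and a simple string label for each remote event. The function names and event labels are hypothetical.

```python
from typing import Optional, Tuple

FOURTH_INTERVAL_S = 5.0   # max time from PTT release to last speech data sent
FIFTH_INTERVAL_S = 10.0   # how long tab 214V stays up


def upload_tab(ptt_release_s: float, last_packet_sent_s: float) -> str:
    """Step 200(h): switch to the "processing too long" alert if uploading
    the utterance to the head-end exceeds the fourth interval."""
    if last_packet_sent_s - ptt_release_s > FOURTH_INTERVAL_S:
        return "214V"
    return "213"


def while_214v_shown(event: str, shown_for_s: float) -> Tuple[Optional[str], bool]:
    """Steps 200(j)-200(k).

    event is "ptt", "button" (any other remote key), or "tick".
    Returns (tab still on screen, whether to restart at 200(a)).
    """
    if event == "ptt":
        return None, True        # 200(j): clear the alert and listen again
    if event == "button" or shown_for_s >= FIFTH_INTERVAL_S:
        return None, False       # 200(k): dismissed or timed out
    return "214V", False
```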

[0066] FIG. 2D illustrates the steps 200(l)-200(u) that follow the completion of the recognition started in 200(f). Note that the system keeps track of the number of consecutive unsuccessful recognitions. This number is reset to zero after a successful recognition and when the cable subscriber presses any remote control button. If this number is not reset, the cable subscriber continues to see the long recognition feedback message any time there is an unsuccessful recognition. If cable subscribers are having difficulty with the system, the long message is helpful, even when several hours have elapsed between unsuccessful recognitions. The recognition feedback only stays on screen for perhaps one second, so it is not necessary to remove it when any of the remote control buttons is pressed. When the push-to-talk button is pressed again, the recognition feedback should be replaced by the speech activation tab 211.

[0067] 200(l): Checking whether speech recognition is successful.

[0068] 200(m): If speech recognition is successful, displaying a fifth tab 215 signaling a positive speech recognition. The fifth tab includes a green check mark overlaid on a non-highlighted brand logo. It stays on the screen for an interval of, for example, about one second.

[0069] 200(n): If the push-to-talk button is pressed again before the fifth tab 215 disappears, repeating 200(a).

[0070] Step 200(l) is also followed by step 200(o), which in turn is followed by steps 200(p) and 200(q):

[0071] 200(o): If the speech recognition is unsuccessful, checking the number of unsuccessful recognitions. The number is automatically tracked by the communications system and is reset to zero upon each successful recognition or when any button of the remote control device is pressed.

[0072] 200(p): If the completed recognition is the first unsuccessful recognition, displaying a sixth tab 216 signaling a misrecognition of speech. This sixth tab 216 includes a red question mark overlaid on said brand logo. It stays on the screen for, for example, about one second.

[0073] 200(q): If the push-to-talk button is pressed again before the sixth tab 216 disappears, repeating 200(a).

[0074] Step 200(o) is followed by the steps 200(r) and 200(s):

[0075] 200(r): If the completed recognition is the second unsuccessful recognition, displaying a first variant 216A of the sixth tab signaling a misrecognition of speech and displaying a short textual message. This first variant 216A of the sixth tab comprises a red question mark overlaid on said brand logo and a short dialog box displaying a short textual message. The first variant 216A stays on the screen for, for example, about ten seconds.

[0076] 200(s): If the push-to-talk button is pressed again before the first variant 216A of the sixth tab disappears, repeating 200(a).

[0077] Step 200(o) is followed by the steps 200(t) and 200(u):

[0078] 200(t): If it is the third (or any subsequent) unsuccessful recognition, displaying a second variant 216B of the sixth tab signaling a misrecognition of speech and displaying a long textual message. The second variant 216B of the sixth tab stays on the screen for an interval of, for example, about ten seconds.

[0079] 200(u): If the push-to-talk button is pressed before the second variant 216B of the sixth tab disappears, repeating 200(a).
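The escalation from tab 216 to variants 216A and 216B amounts to a counter with two reset paths. The class below is a minimal sketch with hypothetical names. It assumes that only remote keys other than push-to-talk reset the count, since resetting on every push-to-talk press would prevent the escalation from ever occurring. Display times are the example values given above.

```python
from typing import Tuple


class RecognitionFeedback:
    """Tracks consecutive unsuccessful recognitions (steps 200(l)-200(u))."""

    def __init__(self) -> None:
        self.misses = 0

    def on_remote_button(self) -> None:
        """Any remote key other than push-to-talk resets the count."""
        self.misses = 0

    def on_result(self, recognized: bool) -> Tuple[str, float]:
        """Return (tab numeral, example display time in seconds)."""
        if recognized:
            self.misses = 0
            return "215", 1.0    # 200(m): green check mark
        self.misses += 1
        if self.misses == 1:
            return "216", 1.0    # 200(p): red question mark only
        if self.misses == 2:
            return "216A", 10.0  # 200(r): adds a short textual message
        return "216B", 10.0      # 200(t): long message from the third miss on
```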

[0080] FIG. 2E illustrates the steps 200(v)-200(x) following step 200(e). Note that in the preferred embodiment, there are two different messages when the talk button is held down for a long interval. The first message covers the relatively normal case where the cable subscriber takes more than ten seconds to speak the command. The second covers the abnormal case where the push-to-talk button is stuck. There is no transition between the two messages. The second message stays on screen until the button is released.

[0081] 200(e): If the push-to-talk button is not released within a second interval (about ten seconds, for example), interrupting recognition.

[0082] 200(v): Displaying a first variant 214A of the fourth tab. The first variant 214A includes a yellow exclamation point and a first textual message. This tab stays on the screen for an interval of, for example, about ten seconds.

[0083] 200(w): Removing the first variant 214A of the fourth tab from the screen if the push-to-talk button is released after the interval has lapsed.

[0084] 200(x): Displaying a second variant 214B of the fourth tab. The second variant 214B includes a yellow exclamation point and a second textual message. This tab is not removed unless the push-to-talk button is released.
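This sketch assumes that variant 214B replaces 214A once the latter's ten-second display interval has run out with the button still held. Under that assumption, the tab shown during a long hold depends only on how long the button has been down. The hypothetical function below models only the period while the button remains held, and both thresholds are the example values from the text.

```python
SECOND_INTERVAL_S = 10.0  # hold time that interrupts recognition (200(e))
ALERT_214A_S = 10.0       # example display time for variant 214A (200(v))


def long_hold_tab(held_s: float) -> str:
    """Tab shown while push-to-talk has been held for held_s seconds."""
    if held_s < SECOND_INTERVAL_S:
        return "211"    # still listening
    if held_s < SECOND_INTERVAL_S + ALERT_214A_S:
        return "214A"   # 200(v): the subscriber is taking too long to speak
    return "214B"       # 200(x): the button is probably stuck
```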

[0085] Command Sequencing

[0086] Described below are various issues concerning command sequencing. These issues arise from the latency between a command and its execution. Spoken commands introduce longer latencies because speech requires more bandwidth to the head-end and can be affected by network congestion. In addition, some applications are implemented by an agent. In these cases, recognition is performed on the engine of the communications system and the command is then sent on to the agent's application server. Applications on the engine and those on the agent's server should look the same to cable subscribers. In particular, it is highly desirable for the recognition feedback for a spoken command and the results of the execution to appear on the television screen at the same time. However, if there is likely to be latency in communicating with an off-engine application server or in the execution of the command, the recognition feedback should appear as soon as it is available.

[0087] When there is congestion and spoken commands are taking a long time to process, the cable subscriber may try to use the buttons on the remote control or to issue another spoken command. The sequence diagrams below describe what happens when the cable subscriber attempts to issue another command. There are race conditions in the underlying system. The guidelines to handle these sequencing issues support two general goals:

[0088] First, the cable subscriber should be in control. If a command is taking too long, the cable subscriber should be able to issue another command. In the sequence diagrams, when a cable subscriber presses a remote control button while a spoken command is being processed, the spoken command is preempted, where possible, to give control back to the cable subscriber. A detailed description of where preemption is possible, and which part of the system is responsible for the preemption, accompanies the sequence diagrams.

[0089] Second, the system should be as consistent as possible. To accomplish this, it is necessary to minimize the race conditions in the underlying system. This can be done in at least two ways:

[0090] (1) Prevent the cable subscriber from issuing a second voice command until the STB receives an indication of whether the recognition for the first command was successful or not. This makes it highly probable that the application has received the first command and is executing it by the time the subscriber sees the recognition feedback. If the command still takes a long time to execute, there are two possible explanations: either there is a network problem between the engine and the application server executing the command, or the latency is in the application, not the speech recognition system. Network problems can be handled via the command sequencing described below. Applications where there can be long latencies should already have built-in mechanisms to deal with multiple requests being processed at the same time. For example, it can take a long time to retrieve a web page, and the web browser would be prepared to discard the first request when a second request arrives.

[0091] (2) Require applications to sequence the execution of commands as follows. If the cable subscriber issues commands in the order spoken command (A), followed by button command (B), and the application receives them in the order A, B, both commands are executed. If the application receives them in the order B, A, command B is executed, and when command A arrives, it is discarded because it is obsolete.
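Rule (2) can be sketched as follows. The text does not say how an application learns the order in which commands were issued. This Python sketch therefore assumes, hypothetically, that each command carries a monotonically increasing sequence number stamped when the subscriber issues it. `Command` and `CommandSequencer` are illustrative names.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Command:
    seq: int      # hypothetical issue-order stamp, e.g. assigned by the STB
    source: str   # "speech" or "button"
    action: str


@dataclass
class CommandSequencer:
    """Executes commands in issue order and drops any that arrive late."""
    last_seq: int = -1
    executed: List[str] = field(default_factory=list)

    def receive(self, cmd: Command) -> bool:
        if cmd.seq <= self.last_seq:
            return False  # obsolete: a later-issued command already ran
        self.last_seq = cmd.seq
        self.executed.append(cmd.action)
        return True
```

The application discards by sequence number rather than by arrival time. That is what lets it drop spoken command A in the B, A ordering even though A was issued first.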

[0092] FIG. 3A through FIG. 3E are sequence diagrams showing the points in time where a second command may be issued and describing what should happen when the second command is issued.

[0093] FIG. 3A shows the timeline of a normal spoken command. The round dots 310 are events. A bar 320 that spans events indicates activity. For example, the bar between push-to-talk (PTT) button pressed and PTT button released indicates that the PTT button is depressed and speech packets are being generated. The labels on the left side of the diagram indicate the components in the system. STB/VoiceLink refers to the input system including the set-top box 130, the remote control 110, and the receiver 120 as illustrated in FIG. 1.

[0094] The application wrapper and the application server are listed as separate components. When the entire application resides on the engine, the wrapper and the server are the same component, and command sequencing is easier.

[0095] A dot on the same horizontal line as the name of the component means that the event occurred in this component. The labels 330 on the bottom of the diagram describe the events that have occurred. The events are ordered by the time they occurred.

[0096] There are four cases where a button or spoken command can be issued while another command is already in progress. These are shown under the label “Interrupt cases” 340 at the top right of the diagram. The rest of the diagrams (FIGS. 3B-3E) describe what happens in each of these cases.

[0097] FIG. 3B shows the timeline when the spoken command is interrupted by a button input (case #1). In this case, the cable subscriber presses a remote control button before the STB/VoiceLink has sent all of the packets for the spoken command to the Recognition System. The diagram shows that the spoken command is cancelled and the remote control button command is executed. The STB/VoiceLink and the Recognition System should cooperate to cancel the spoken command.

[0098] FIG. 3C shows the timeline when the spoken command is interrupted by a button input (case #2). In this case, the cable subscriber presses a remote control button after the last packet is received by the Recognition System and before the n-best list is processed by the application wrapper. In both sub-cases shown, A and B, the spoken command is discarded and the button command is executed. The diagram shows that in sub-case A the STB/VoiceLink and the Recognition System could have cooperated to cancel the spoken command, and the application would not have had to be involved. In sub-case B, the application cancels the spoken command because it arrived out of sequence.

[0099] FIG. 3D shows the timeline when the spoken command is interrupted by a button input (case #3). In this case, the cable subscriber presses a remote control button after the positive recognition acknowledgement is received and before the spoken command is executed. It is the application's responsibility to determine which of the two commands to execute. In sub-case A, the spoken command is received out of sequence, and it is ignored. In sub-case B, the spoken command is received in order, and both the spoken command and the remote control button command are executed.

[0100]FIG. 3E shows the time line in a case where the spoken command isinterrupted by a speech input. The cable subscriber issues a secondspoken command after the positive recognition acknowledgement wasreceived and before the first spoken command was executed. It is theapplication's responsibility to determine which of the two commands toexecute. In sub-case A the spoken commands are received in order andboth commands are executed. In sub-case B, the spoken commands arereceived out of order, the second command is executed, and the firstcommand is ignored.
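The application-side rule common to case #2 sub-case B, case #3, and the speech-interrupt case of FIG. 3E (execute commands in the order they were issued, and drop any command that arrives after a later-issued command has already run) can be sketched as follows. This is a minimal illustration, assuming each command carries a hypothetical sequence number stamped when the subscriber presses push-to-talk or a remote button; `Command` and `CommandArbiter` are illustrative names, and the upstream cancellation by the STB/Voice Link and Recognition System (case #1, case #2 sub-case A) is not modeled.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Command:
    seq: int      # hypothetical: order in which the subscriber issued the command
    source: str   # "speech" or "button"
    name: str


@dataclass
class CommandArbiter:
    """Application-side arbitration for the interrupt cases of FIGS. 3B-3E (sketch)."""
    last_seq: int = -1
    executed: List[Command] = field(default_factory=list)

    def deliver(self, cmd: Command) -> bool:
        """Execute cmd unless a later-issued command has already been executed."""
        if cmd.seq <= self.last_seq:
            return False  # stale (out of sequence) or duplicate: ignore it
        self.last_seq = cmd.seq
        self.executed.append(cmd)  # stand-in for actually performing the command
        return True
```

In the speech-interrupt case, sub-case B, the second spoken command (seq 1) arrives first and runs; when the first spoken command (seq 0) arrives afterwards, `deliver` returns False and it is ignored. When commands arrive in issue order, every call returns True and both run.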

[0101] Help Overlay

[0102] The help overlay displays a short, context-sensitive list of frequently used spoken commands for each unique screen of every speech-enabled application. The help overlay is meant to accomplish two goals: first, to give new users hints that let them control the basic functionality of a particular speech-enabled application; and second, to remind experienced users of basic commands in case they forget them. In addition to displaying application-specific commands, the help overlay always shows the commands for accessing the main menu overlay and “more help” from the user center. The help overlay also explains the speakable text indicator, if it is activated. Note that the help overlay helps the cable subscriber use spoken commands; it does not describe application functionality.

[0103] The help overlays are organized as follows:

[0104] Application-specific commands (approximately five basic commands)

[0105] “More help” command (link to the user center)

[0106] “Main Menu” command to display main menu overlay

[0107] “Exit” to make overlay disappear

[0108] FIG. 4 is a flow diagram illustrating a process by which the help overlay appears and disappears. The process includes the following steps:

[0109] 400(a): Displaying a first help overlay if the speech recognition is successful. The first help overlay 410 is a dialog box which includes (1) a tab signaling a positive speech recognition, for example a green check mark overlaid on a non-highlighted brand logo; (2) a text box for textual help information, which may further include a “more help” link and speakable text; and (3) virtual buttons, one for the main menu and the other for exit, which makes the overlay disappear. The first help overlay might stay on the screen for a first interval, for example, twenty seconds.

[0110] 400(b): Removing the first help overlay 410 from the screen if (1) the first interval elapses; (2) any button of the remote control device is accidentally pressed; or (3) the exit button is selected.

[0111] 400(c): Displaying a second help overlay 420 while the push-to-talk button is being pressed to give a new speech input. Structurally, the help overlay 420 is the same as the help overlay 410. The only difference is that the immediate feedback tab in the help overlay 420 signals push-to-talk activation rather than a positive recognition as in the help overlay 410.
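The show/hide logic of steps 400(a) through 400(c) can be modeled as a small state machine. The sketch below is a minimal illustration, assuming a caller that passes in a clock value in seconds; the class name, method names, and tab labels are hypothetical. It treats push-to-talk as the one remote button that does not dismiss the overlay (since overlay 420 is shown while it is held), and it additionally assumes the first interval does not run while push-to-talk is held, which the text does not specify.

```python
from dataclasses import dataclass
from typing import Optional

HELP_TIMEOUT_S = 20.0  # example first interval from step 400(a)


@dataclass
class HelpOverlay:
    """Sketch of the FIG. 4 show/hide logic; tab labels are hypothetical."""
    tab: Optional[str] = None  # None means the overlay is hidden
    shown_at: float = 0.0

    @property
    def visible(self) -> bool:
        return self.tab is not None

    def on_help_recognized(self, now: float) -> None:
        # 400(a): show overlay 410 with the positive-recognition tab.
        self.tab = "positive_recognition"
        self.shown_at = now

    def on_button(self, button: str) -> None:
        if not self.visible:
            return
        if button == "push_to_talk":
            # 400(c): same layout (overlay 420), but the tab signals push-to-talk.
            self.tab = "push_to_talk"
        else:
            # 400(b): any other remote button removes the overlay.
            self.tab = None

    def on_exit_selected(self) -> None:
        # 400(b): selecting the virtual exit button removes the overlay.
        self.tab = None

    def tick(self, now: float) -> None:
        # 400(b): remove the overlay once the first interval elapses
        # (assumption: the interval is not enforced while push-to-talk is held).
        if self.tab == "positive_recognition" and now - self.shown_at >= HELP_TIMEOUT_S:
            self.tab = None
```

The same structure would serve the main menu overlay of FIG. 5, whose steps 500(a) through 500(c) mirror these.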

[0112] Feedback Overlays

[0113] There are two types of Feedback Overlays: Recognition Feedback Overlays and Application Feedback Overlays. Recognition Feedback Overlays inform the cable subscriber that there has been a problem with speech recognition. Application Feedback Overlays inform the cable subscriber about errors or problems related to the application's speech interface. Recognition Feedback Overlays exist in three states and respond to several different conditions. The three Recognition Feedback states correspond to the number of consecutive unsuccessful recognitions. This behavior occurs when the cable subscriber tries multiple times to issue a command which is not recognized by the system; the three states offer progressively more feedback to the cable subscriber with each attempt. The response to each attempt would include links to escalating levels of help.

[0114] The three recognition feedback states are: (1) on the first unsuccessful recognition, the immediate speech feedback indicator changes to a question mark, which provides minimal, quickly understood feedback to the cable subscriber; (2) on the second unsuccessful recognition, the feedback overlay is displayed with a message and a link to the help overlay; and (3) on the third unsuccessful recognition, the feedback overlay is displayed with another message and links to the help overlay and to more help in the user center.
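The escalation through these three states amounts to a counter of consecutive failures. In the minimal sketch below, the class, field, and message-slot names are hypothetical; resetting the counter on a successful recognition and keeping the question-mark tab in the second and third states are assumptions. Claim 8 indicates that the third state also covers any further failures, which is how the sketch treats them.

```python
from dataclasses import dataclass
from typing import NamedTuple, Optional, Tuple


class Feedback(NamedTuple):
    indicator: str             # immediate speech feedback tab graphic
    overlay: Optional[str]     # feedback overlay message slot, or None for no overlay
    links: Tuple[str, ...]     # help links offered on the overlay


@dataclass
class RecognitionFeedback:
    failures: int = 0  # consecutive unsuccessful recognitions

    def on_result(self, recognized: bool) -> Feedback:
        if recognized:
            self.failures = 0  # assumption: a success restarts the escalation
            return Feedback("check_mark", None, ())
        self.failures += 1
        if self.failures == 1:
            # State 1: only the tab changes to a question mark.
            return Feedback("question_mark", None, ())
        if self.failures == 2:
            # State 2: overlay with a message and a link to the help overlay.
            return Feedback("question_mark", "second_failure_message", ("help",))
        # State 3 (third and later failures): add a link to more help in the user center.
        return Feedback("question_mark", "third_failure_message", ("help", "more_help"))
```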

[0115] The different recognition feedback conditions, which correspond to the amount of information that the recognizer has about the cable subscriber's utterance and to the latency in the underlying system, include:

[0116] Low confidence score. A set of generic “I don't understand” messages is displayed.

[0117] Medium confidence score. A list of possible matches may be displayed.

[0118] Sound level of utterance too low. The “Speak more loudly or hold the remote closer” message is displayed.

[0119] Sound level of utterance too high. The “Speak more softly or hold the remote farther away” message is displayed.

[0120] Talking too long. In the preferred embodiment, there is a ten-second limit to the amount of time the push-to-talk button may be depressed. If the time limit is exceeded, the utterance is discarded and the “Talking too long” message is displayed.

[0121] Push-to-talk button stuck. If the push-to-talk button has been depressed, for example, for twenty seconds, the “push-to-talk button stuck” message is displayed.

[0122] Processing too long. As described in 200(h) above, if the remote control and the STB are unable to transfer an utterance to the head-end within, for example, five seconds after the push-to-talk button is released, the “Processing too long” message is displayed.
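The conditions above can be checked in a fixed order for each utterance. The sketch below is illustrative only: the ten-, twenty-, and five-second values come from the examples in [0120] through [0122], but the confidence and sound-level thresholds, the `Utterance` fields, the condition labels, and the order in which conditions are tested are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

MAX_TALK_S = 10.0          # [0120]: preferred push-to-talk limit
STUCK_S = 20.0             # [0121]: example stuck-button interval
MAX_PROCESSING_S = 5.0     # [0122]: example transfer deadline after release

# Hypothetical thresholds; the text does not give values for these.
LOW_CONFIDENCE = 0.40
MEDIUM_CONFIDENCE = 0.70
MIN_LEVEL_DB = -40.0
MAX_LEVEL_DB = -3.0


@dataclass
class Utterance:
    held_s: float        # how long push-to-talk was held
    transfer_s: float    # time to reach the head-end after release
    level_db: float      # average input level of the utterance
    confidence: float    # recognizer confidence for the best match


def feedback_condition(u: Utterance) -> Optional[str]:
    """Return the recognition feedback condition, or None if the result can be used."""
    if u.held_s >= STUCK_S:
        return "push_to_talk_stuck"
    if u.held_s > MAX_TALK_S:
        return "talking_too_long"       # the utterance is discarded
    if u.transfer_s > MAX_PROCESSING_S:
        return "processing_too_long"
    if u.level_db < MIN_LEVEL_DB:
        return "sound_level_too_low"
    if u.level_db > MAX_LEVEL_DB:
        return "sound_level_too_high"
    if u.confidence < LOW_CONFIDENCE:
        return "low_confidence"         # generic "I don't understand" message
    if u.confidence < MEDIUM_CONFIDENCE:
        return "medium_confidence"      # show a list of possible matches
    return None
```

The stuck-button test comes first because a stuck button would also exceed the ten-second talking limit; in practice that condition would be raised while the button is still held rather than after release.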

[0123] Application Feedback Overlays are displayed when application-specific information needs to be communicated to the cable subscriber. A different indicator at the top of the overlay (for example, tab 214) differentiates Application Feedback from Recognition Feedback. Application Feedback would include response or deficiency messages pertaining to the application's speech interface.

[0124] Main Menu Overlays

[0125] In the preferred embodiment, the main menu overlay provides a list of speech-enabled digital cable services that are available to the cable subscriber. The main menu overlay is meant to be faster and less intrusive than switching to a separate screen to get the same functionality. The service list may, for example, include: (1) “Watch TV” for full screen TV viewing; (2) “Program Guide”; (3) “Video on Demand”; (4) “Walled Garden/Internet”; and (5) “User Center.” The current service is highlighted. Additional commands displayed include “Exit” to make the overlay disappear.

[0126] FIG. 5 is a flow diagram illustrating the process by which the menu overlay appears and disappears. The process includes the following computer-implemented steps:

[0127] 500(a): Displaying a first main menu overlay if the speech recognition is successful. The first main menu overlay 510 is a dialog box which includes (1) a tab signaling a positive speech recognition, for example a green check mark overlaid on a non-highlighted brand logo; (2) a text box for textual information about the main menu, which may further include speakable text; and (3) one or more virtual buttons, such as the help button and the exit button. The main menu overlay stays on the screen for a first interval, for example, 20 seconds.

[0128] 500(b): Removing the first main menu overlay 510 from the screen if (1) the first interval elapses; (2) any button of the remote control is accidentally pressed; or (3) the exit button is selected.

[0129] 500(c): Displaying a second main menu overlay 520 while the push-to-talk button is being pressed to give a new speech input for navigation. Structurally, the second main menu overlay 520 is the same as the first main menu overlay 510. The only difference is that the immediate feedback tab in the second main menu overlay 520 signals push-to-talk activation rather than a positive recognition as in the first main menu overlay 510.

[0130] Speakable Text Indicator

[0131] The Speakable Text Indicator appears layered above speech-enabled applications as a part of the GSUI. This treatment may apply to static or dynamic text. Static text is used in labels for on-screen graphics or buttons that may be selected by moving a highlight with the directional keys on the remote control. Because most screens have several text-labeled buttons, they require a corresponding number of speakable text indicators. Dynamic text is used in content such as the list of movies for the Video on Demand (VOD) application. Each line of dynamic text may include speakable text indicators to show which words are speakable. The speakable text indicator is currently a green dot, and may be changed to a different indicator. It is important that the indicator be visible but not distracting. Additionally, the cable subscriber should have the ability to turn the speakable text indicators on and off.
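For dynamic text, the indicator reduces to a per-line decision made at render time. The minimal sketch below, with a hypothetical function name and a text character standing in for the green dot graphic, marks only the lines whose text is in the current speakable vocabulary and honors the subscriber's on/off setting; marking individual words within a line is not shown.

```python
from typing import Iterable, List

SPEAKABLE_MARK = "\u25cf"  # stand-in for the green dot; a real overlay would draw a graphic


def render_list(items: Iterable[str], speakable: Iterable[str], indicators_on: bool) -> List[str]:
    """Prefix each speakable line with the indicator; pad others to keep columns aligned."""
    vocabulary = {phrase.lower() for phrase in speakable}
    lines = []
    for text in items:
        if indicators_on and text.lower() in vocabulary:
            lines.append(f"{SPEAKABLE_MARK} {text}")
        else:
            lines.append(f"  {text}")
    return lines
```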

[0132] Television Screen Interface—Graphic User Interface (GUI)

[0133] The GSUI overlays described above are created from a set of toolkit elements. The toolkit elements include layout, brand indicator, feedback tab, dialog box, text box, typeface, background imagery, selection highlight, and speakable text indicator.

[0134] The multiple system operator (MSO) has some flexibility to specify where the GSUI should appear. The GSUI is anchored by the immediate speech feedback tab, which should appear along one of the edges of the screen. The anchor point and the size and shape of the dialog boxes may be different for each MSO.

[0135] The brand identity of the service provider or the system designer may appear alone or in conjunction with the MSO brand identity. Whenever the brand identity appears, it should preferably be consistent in location, size, and color treatment. The static placement of the brand indicator is key in reinforcing that the GSUI feedback is coming from the designer's product. Various states of color and animation on the brand indicator are used to indicate system functionality. Screens containing the brand indicator contain information relative to speech recognition. The brand indicator has various states of transparency and color to provide visual clues to the state or outcome of a speech request. For example: a 40% transparency indicator logo is used as a brand indication, which appears on all aspects of the GSUI; a solid indicator logo is used to indicate that the remote's push-to-talk button is currently being pressed; and a 40% transparency flashing indicator logo is used to indicate that the system heard what the user said and is processing the information. A brand indicator may be placed anywhere on the screen, but it is preferably positioned in the upper left corner of the screen and remains the same size throughout the GSUI.

[0136] The feedback tab is the on-screen graphical element used to implement immediate speech feedback as described above. The feedback tab uses a variety of graphics to indicate the status and outcome of a speech request. For example: a green check mark overlaid on the brand indicator might indicate “Positive Speech Recognition Feedback”; a red question mark overlaid on the brand indicator might indicate “Misrecognition Speech Feedback”; a 40% transparency flashing brand indicator logo might indicate “Speech Recognition Processing”; a solid brand indicator logo might indicate “Push to Talk Button Activation”; a yellow exclamation point overlaid on the brand indicator logo might indicate “Application Alert”; and a prohibition sign overlaid on the brand indicator logo might indicate “Non-speech Enabled Alert”. The presently preferred tab design rules include: (1) any color used should be consistent (for example, R: 54, G: 152, B: 217); (2) it should always have a transparent background; (3) it should always be consistently aligned, for example, to the top of the TV screen; (4) the size should always be consistent, for example, 72 w × 67 h pixels; (5) the brand indicator should always be present; (6) the bottom corners should be rounded; and (7) the star and graphic indicators should be centered in the tab.
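The combinations of logo treatment and overlaid graphic listed in [0135] and [0136] can be kept in one lookup table so that every speech-enabled application renders a given speech state identically. In the sketch below, the enum, class, and field names are hypothetical; 40% transparency is expressed as an opacity of 0.6, and drawing the check mark, question mark, exclamation point, and prohibition sign over the 40% transparency (non-highlighted) logo is an assumption based on the description of help overlay 410.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional

TAB_SIZE_PX = (72, 67)          # example width x height from [0136]
TAB_COLOR_RGB = (54, 152, 217)  # example consistent color from [0136]


class SpeechState(Enum):
    IDLE = auto()                  # plain brand indication
    PUSH_TO_TALK = auto()
    PROCESSING = auto()
    POSITIVE_RECOGNITION = auto()
    MISRECOGNITION = auto()
    APPLICATION_ALERT = auto()
    NON_SPEECH_ENABLED = auto()


@dataclass(frozen=True)
class TabGraphic:
    logo_opacity: float    # 0.6 == 40% transparency, 1.0 == solid
    flashing: bool
    glyph: Optional[str]   # graphic overlaid on the brand indicator, if any


TAB_GRAPHICS = {
    SpeechState.IDLE: TabGraphic(0.6, False, None),
    SpeechState.PUSH_TO_TALK: TabGraphic(1.0, False, None),
    SpeechState.PROCESSING: TabGraphic(0.6, True, None),
    SpeechState.POSITIVE_RECOGNITION: TabGraphic(0.6, False, "green check mark"),
    SpeechState.MISRECOGNITION: TabGraphic(0.6, False, "red question mark"),
    SpeechState.APPLICATION_ALERT: TabGraphic(0.6, False, "yellow exclamation point"),
    SpeechState.NON_SPEECH_ENABLED: TabGraphic(0.6, False, "prohibition sign"),
}


def tab_for(state: SpeechState) -> TabGraphic:
    """Look up the feedback tab treatment for a speech state."""
    return TAB_GRAPHICS[state]
```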

[0137] The dialog box implements the Feedback Overlay, Help Overlay, Main Menu Overlay, and Command List Overlay described above. The dialog box is a bounded simple shape. It may contain a text box to convey information associated with the service provider's product. It may also contain virtual buttons that can be selected either by voice or by the buttons on the remote control. Different dialog boxes may use different sets of virtual buttons. When two different dialog boxes use a virtual button, it should preferably appear in the same order relative to the rest of the buttons and have the same label in each dialog box.

[0138] Illustrated in FIG. 6A is an exemplary help dialog box 600. FIG. 6B is a screen capture showing the appearance of the help dialog box illustrated in FIG. 6A. The dialog box 600 includes a background box 610 used to display graphic and textual information, a text box 630 used to display textual information, a brand indicator logo 640, and virtual buttons 650 and 655. The text box 630 is overlaid on the background box 610. The presently preferred dialog box design rules include: (1) the dialog box should always flush align to the top of the TV screen; (2) the bottom corners should be rounded; (3) the service provider's Background Imagery should always be present; (4) the box height can fluctuate, but the width should stay consistent; and (5) the box should always appear on the left side of the TV screen.

[0139] The text box 630 conveys information associated with the provider's product. This information should stand out from the background imagery 620. To accomplish this, the text box 630 is a bounded shape placed within the bounded shape of the background box 610. In a typical embodiment, the textual information in the text box 630 is always presented on a solid blue box, which is then overlaid on the background box 610. There can be more than one text box per dialog box. For example, the main menu overlay contains one text box for each item in the main menu. Secondary navigation, such as the “menu” button 655 and “exit” button 650, can be displayed outside the text box on the dialog box background imagery. The presently preferred text box 630 design rules include: (1) the color should always be R: 42, G: 95, B: 170; (2) the text box should always sit eight pixels in from each side of the dialog box; (3) all corners should be rounded; and (4) all text within a text box should be flush left.
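Because the dialog box width is fixed while its height follows its content, and each text box sits eight pixels in from the sides, laying out a dialog with several text boxes (such as the main menu) reduces to simple arithmetic. In the sketch below, the function and type names, the header height, the gap between text boxes, and the use of the same eight-pixel margin at the bottom are hypothetical choices; only the eight-pixel side inset and the text box color come from the text.

```python
from dataclasses import dataclass
from typing import List, Tuple

TEXT_BOX_INSET_PX = 8          # [0139]: eight pixels in from each side of the dialog box
TEXT_BOX_RGB = (42, 95, 170)   # [0139]: text box color (not used in the geometry below)


@dataclass(frozen=True)
class Rect:
    x: int
    y: int
    w: int
    h: int


def layout_dialog(width: int, header_px: int, heights: List[int],
                  gap_px: int = 4) -> Tuple[Rect, List[Rect]]:
    """Stack text boxes in a fixed-width dialog anchored at the top-left corner (0, 0).

    Returns the dialog rectangle, whose height grows with the content, and the text boxes.
    """
    boxes = []
    y = header_px
    for h in heights:
        boxes.append(Rect(TEXT_BOX_INSET_PX, y, width - 2 * TEXT_BOX_INSET_PX, h))
        y += h + gap_px
    if heights:
        y -= gap_px  # no gap after the last text box
    return Rect(0, 0, width, y + TEXT_BOX_INSET_PX), boxes
```

Anchoring the dialog at (0, 0) reflects the rules in [0138] that the dialog box is flush with the top of the screen and appears on the left side.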

[0140] Use of a single font family with a combination of typefaces helps reinforce the brand identity. When different typefaces are used, each should be used for a specific purpose. This helps the cable subscriber gain familiarity with the user interface. Any typeface used should be legible on the TV screen.

[0141] The background imagery 620 is used to reinforce the brand logo. The consistent use of the logo background imagery helps brand and visually indicate that the information being displayed is part of the speech recognition product.

[0142] The selection highlight is a standard graphical element used to highlight a selected item on-screen. In a typical embodiment, it is a two-pixel yellow rule used to outline text or a text box, indicating that it is the currently selected item.

[0143] The speakable text indicator is preferably a consistent graphical element. It should always keep the same treatment. It should be placed next to any speakable text that appears on-screen. In a preferred embodiment, the speakable text indicator is a green dot. The green dot should be consistent in size and color throughout the GSUI and in all speech-enabled applications. Perhaps the only exception to this rule is that the green dot is larger in the help text about the green dot itself.

[0144] The feedback tab is the graphic element used for immediate speech feedback. This element appears on top of any other GSUI overlay on screen. For example, if the help overlay is on screen and the cable subscriber presses the push-to-talk button, the push-to-talk button activation tab, i.e. the solid logo image, appears on top of the help overlay.

[0145] The help overlay contains helpful information about the speech user interface, as well as menu and exit buttons. The visual design of the help overlay is a dialog box that uses these graphical elements: brand indicator, text box, background imagery, typeface and menu highlight, as well as a dialog box title indicating which service the Help is for. The content in the text box changes relative to the digital cable service being used. The help overlay should never change design layout but can increase or decrease in length according to text box needs.

[0146] The feedback overlay is displayed upon misrecognition of voice commands. The presently preferred visual design of the feedback overlay is a dialog box that uses the following graphical elements: brand indicator, text box, background imagery, typeface and menu highlight, as well as a dialog box title indicating which service the feedback is for. The feedback overlay should never change design layout but can increase or decrease in length according to text box needs.

[0147] The main menu overlay is a dialog box that contains a dialog box title, buttons with links to various digital cable services, and an exit button. The presently preferred main menu uses the following graphical elements: dialog box, background imagery, typeface, menu highlight, and text box. Each selection on the main menu is a text box.

[0148] Navigation

[0149] The GSUI incorporates various navigation functions. For example, the user navigates on-screen list-based information via speech control. List-based information may be manipulated and navigated in various ways, including commands such as “go to letter (letter name)” and “page up/down”. Items in lists of movies and programs may also be accessed in random fashion by simply speaking the item name. When viewing a movie list, the user may simply say a movie name within that list and be linked to the movie information screen.
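A list screen that supports these three kinds of commands can be sketched as below. This is a minimal illustration under stated assumptions: the class and method names and the default page size are hypothetical, the list is assumed to be sorted alphabetically, and direct selection by name is restricted to the items currently on screen, a choice made here to keep the active recognition vocabulary small.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SpokenList:
    items: List[str]      # assumed sorted alphabetically, e.g. a VOD movie list
    page_size: int = 8    # hypothetical number of rows per screen
    top: int = 0          # index of the first visible row

    def visible(self) -> List[str]:
        return self.items[self.top:self.top + self.page_size]

    def page(self, direction: int) -> None:
        """'page up' (direction=-1) or 'page down' (direction=+1), clamped to the list."""
        last_top = max(0, len(self.items) - self.page_size)
        self.top = min(max(0, self.top + direction * self.page_size), last_top)

    def go_to_letter(self, letter: str) -> None:
        """'go to letter X': scroll to the first item at or after X alphabetically."""
        target = letter.upper()
        for i, item in enumerate(self.items):
            if item[:1].upper() >= target:
                self.top = i
                return
        self.top = max(0, len(self.items) - self.page_size)

    def select_by_name(self, spoken: str) -> Optional[str]:
        """Return the on-screen item whose name was spoken, if any."""
        wanted = spoken.strip().lower()
        for item in self.visible():
            if item.lower() == wanted:
                return item
        return None
```

The item returned by `select_by_name` is what the application would use to link to the movie information screen.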

[0150] For another example, the user may navigate directly between applications via spoken commands or the speech-enabled main menu. The user may also navigate directly to previously bookmarked favorite pages.

[0151] For another example, the user may initiate the full screen program navigation function, which enables the user to perform the following:

[0152] (1) Navigate, search, filter, and select programs by spoken command. This functionality is similar to many features found in interactive program guides but is accessible without the visual interface, thus allowing a less disruptive channel-surfing experience.

[0153] (2) Initiate via speech control an automatic “scan” type search for programs within categories or genres. For example, the user says “scan sports” to initiate an automatic cycle of sports programming. Each program would remain on screen for a few seconds before advancing to the next program in the category. When the user finds something he wants to watch, he may say “stop”. Categories include but are not limited to sports, children, movies, news, comedy, sitcom, drama, favorites, reality, recommendations, classic, etc. The feature is also available as a means to scan all programs without segmentation by category.

[0154] (3) Add television programs or channels to the categories such as “favorites”; edit television programs or channels in the categories; and delete television programs or channels from the categories. The user may also set “parental control” using these “add”, “edit”, and “delete” functions.

[0155] (4) Search, using spoken commands, for particular programs based on specific attributes. For example, “Find Sopranos”, “Find movie by Coppola”, etc.

[0156] (5) Filter, using spoken commands, groups of programs by specific attributes such as Genre, Director, Actor, Rating, New Release, Popularity, Recommendation, Favorites, etc. For example, “Find Action Movies” or “Show me College Football”, etc.
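Filtering by attribute (item 5) and the category scan of item 2 fit together naturally: “scan sports” can be read as filtering by genre and then cycling through the result until the subscriber says “stop”. The sketch below is a minimal illustration; the `Program` type, the attribute keys, and the function names are hypothetical, and the timed loop that would leave each program on screen for a few seconds before advancing is left to the caller.

```python
from dataclasses import dataclass, field
from itertools import cycle
from typing import Dict, Iterator, List, Optional


@dataclass
class Program:
    title: str
    channel: int
    attributes: Dict[str, str] = field(default_factory=dict)  # e.g. {"genre": "sports"}


def filter_programs(programs: List[Program], **criteria: str) -> List[Program]:
    """Keep programs whose attributes match every criterion (case-insensitive)."""
    return [
        p for p in programs
        if all(p.attributes.get(key, "").lower() == value.lower()
               for key, value in criteria.items())
    ]


def scan(programs: List[Program], genre: Optional[str] = None) -> Iterator[Program]:
    """Cycle through a genre (or every program when genre is None) until the caller stops."""
    pool = programs if genre is None else filter_programs(programs, genre=genre)
    if not pool:
        return
    yield from cycle(pool)
```

A caller would tune to each yielded program, wait a few seconds, and simply stop iterating when the recognizer returns “stop”.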

[0157] Interactive Program Guide Control

[0158] One deployment of the GSUI is for the speech-enabled interactive program guide (IPG), which is the application that the cable subscriber uses to find out what is on television. The IPG supports various functionalities. It enables the user to do the following via spoken commands:

[0159] (1) Access detailed television program information. For example, with a program selected in the guide or viewed full screen, the user issues the command “Get Info” to link to the program information screen.

[0160] (2) Sort programs by category. For example, with the IPG active, the user issues the command “Show Me Sports”. Additional categories include Favorites, Movies, Music, News, etc.

[0161] (3) Access and set parental controls to restrict children's ability to view objectionable programming.

[0162] (4) Access and set reminders for programs to play in the future. For example, with the IPG active, the user issues the command “Go to Friday 8 PM”, and then, with a program selected, issues the command “Set Reminder”.

[0163] (5) Search programs based on specific criteria. For example, with the IPG active, the user issues the command “Find Monday Night Football” or “Find Academy Awards”.

[0164] (6) Complete pay-per-view purchase.

[0166] (7) Upgrade or access premium cable television services.

[0167] Video on Demand Service

[0168] Another deployment of the GSUI is for Video on Demand (VOD), which functions as an electronic version of a video store. The GSUI provides a streamlined interface where many common functions can be performed more easily by spoken commands. The VOD application enables the user to do the following via spoken commands:

[0169] (1) Access detailed movie information.

[0170] (2) Sort by genre, including but not limited to Action, Children, Comedy, Romance, Adventure, New Release, etc.

[0171] (3) Set parental control to restrict children's access to controlled video information.

[0172] (4) Search by movie title, actor, awards, recommendations, etc.

[0173] (5) Get automatic recommendations based on voiceprint identification.

[0174] (6) Navigate on the Internet.

[0175] Other Functions

[0176] The GSUI may further incorporate functionalities to enable the user to perform the following via spoken commands:

[0177] (1) Initiate instant messaging communication.

[0178] (2) Access and play games.

[0179] (3) Control all television settings, including but not limited to volume control, channel up/down, color, brightness, and picture-in-picture activation and position.

[0180] (4) Control personal preferences and set-up options.

[0181] (5) Link to detailed product information, such as product specification, pricing, shipping, etc., based on a television advertisement or a banner advertisement contained within an application screen.

[0182] (6) Receive advertisements or banners based on voiceprint identification.

[0183] (7) Receive programming recommendations based on voiceprint identification.

[0184] (8) Receive personalized information based on voiceprint identification.

[0185] (9) Get automatic configuration of preferences based on voiceprint identification.

[0186] (10) Complete all aspects of a purchase transaction based on voiceprint identification (also called a “OneWord” transaction).

[0187] (11) Initiate a product purchase integrated with broadcast programming. For example, the user's “buy now” command while viewing QVC initiates the purchase procedure.

[0188] (12) Control home services such as home security, home entertainment system and stereo, and home devices such as CD, radio, DVD, VCR, and PVR via a TV-based speech control interface.

[0189] Speech Control—Commands and Guidelines

[0190] Each spoken command is processed in a context that includes commands to access any content named on the screen the cable subscriber is viewing, commands to access application features, commands to access the Global Speech User Interface (GSUI), commands to simulate remote control button presses, and commands to navigate to other applications. Many of the guidelines described herein were developed to try to minimize the potential for words or phrases from one source to become confused with those from another. For example, the content in the Interactive Program Guide (IPG) application contains the names of television shows. There could easily be a television show named “Exit”, which would conflict with using “exit” as the speech equivalent of pressing the exit button on the remote control. The specification for a command describes the way it fits into the environment.

[0191] The presently preferred specification includes the command's: (1) Scope, which characterizes when the command is available; (2) Language, which defines the words cable subscribers use to invoke the command; and (3) Behavior, which specifies what happens when the command is invoked.
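The three-part specification maps directly onto a small record type. In the sketch below, the class name, the `"*"` wildcard for global scope, the whitespace and case normalization of utterances, and the use of a callable for Behavior are illustrative assumptions rather than part of the described system.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Tuple

GLOBAL = "*"  # hypothetical marker for a command available in every context


@dataclass(frozen=True)
class CommandSpec:
    scope: FrozenSet[str]        # Scope: contexts (screens or applications) where it is available
    language: Tuple[str, ...]    # Language: phrases the cable subscriber may say
    behavior: Callable[[], str]  # Behavior: what happens when the command is invoked

    def available_in(self, context: str) -> bool:
        return GLOBAL in self.scope or context in self.scope

    def matches(self, utterance: str) -> bool:
        spoken = " ".join(utterance.lower().split())
        return any(spoken == phrase.lower() for phrase in self.language)
```

For example, the navigation command “Video On Demand” would carry global scope, while “Get Info” would be scoped to the IPG.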

[0192] Global commands are always available. Applications may only disable them to force the user to make a choice from a set of application-specific choices. However, this should be a rare occurrence. Speech interfaces are preferably designed to make the cable subscriber feel like he or she is in control. It is highly desirable for the navigation commands to be speech-enabled and available globally. This allows cable subscribers to move from one application to another via voice. When all of the applications supported by an MSO are speech-enabled, both the navigation commands and the GSUI commands become global. The GSUI commands are always available for speech-enabled applications.

[0193] The navigation commands are preferably always available. The navigation commands include specific commands to allow cable subscribers to go to each application supported by the MSO and general commands that support the navigation model. For example, “Video On Demand” is a specific command that takes the cable subscriber to the VOD application, and “last” is a general command that takes the cable subscriber to the appropriate screen as defined by the navigation model. The language for the navigation commands may be different for each MSO because each MSO supports a different set of applications. The navigation model determines the behavior of the navigation commands. There may be an overall navigation model, and different navigation models for different applications. Where navigation models already exist, navigation is done via remote control buttons. The spoken commands for navigation should preferably be the same as pressing the corresponding remote control buttons. When a screen contains virtual buttons for navigation and the cable subscriber invokes the spoken command corresponding to the virtual button, the virtual button is highlighted and the command invoked.

[0194] The scope for remote control buttons varies widely. Some remote control buttons are rarely used in any application, for example, the “a”, “b”, and “c” buttons. Some are used in most applications, for example, the arrow keys. Because recognition can be improved by limiting choices, it is preferred that each context only include spoken commands for applicable remote control buttons. The behavior of the spoken commands for remote control buttons is the same as pressing the remote control buttons. However, when a screen contains virtual buttons that represent buttons on the remote control and the cable subscriber invokes the spoken command corresponding to a virtual button, the virtual button is highlighted and the command invoked.
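Combining the context sources of [0190] with the button rule above, a recognition context for one screen can be assembled from its sources, with remote control button commands limited to the buttons the screen actually uses, and with any phrase claimed by more than one source flagged, such as a show named “Exit” colliding with the exit button command. The sketch below is illustrative: the source labels, the button phrase table, and the decision to report conflicts rather than resolve them are assumptions.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical table of spoken phrases for remote control buttons.
BUTTON_PHRASES: Dict[str, str] = {
    "up": "up", "down": "down", "left": "left", "right": "right",
    "page up": "page_up", "page down": "page_down", "exit": "exit",
    "a": "a", "b": "b", "c": "c",
}


def button_commands(buttons_used: Set[str]) -> List[str]:
    """Only include spoken commands for buttons that apply to this screen."""
    return [phrase for phrase, key in BUTTON_PHRASES.items() if key in buttons_used]


def build_context(sources: Dict[str, Iterable[str]]) -> Tuple[Dict[str, Set[str]], List[str]]:
    """Map each normalized phrase to the sources that use it, and list conflicting phrases."""
    owners: Dict[str, Set[str]] = {}
    for source, phrases in sources.items():
        for phrase in phrases:
            owners.setdefault(" ".join(phrase.lower().split()), set()).add(source)
    conflicts = sorted(p for p, srcs in owners.items() if len(srcs) > 1)
    return owners, conflicts
```

Reporting conflicts lets a designer apply the guidelines (for instance, renaming or qualifying a command) before the context is deployed.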

[0195] Cable subscribers should rarely be forced to say one of the choices in a dialog box. The global commands are preferably always available unless the cable subscriber is forced to say one of the choices in a dialog box. This should be a rare event. People commonly say phrases such as “Show me” or “Go to” before they issue a command. Application-specific commands should include these phrases to make applications more comfortable to use and more in keeping with continuous or natural language.
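One way to make application-specific commands include such lead-in phrases is to expand each command into optional-prefix variants when the grammar is built. The sketch below assumes the two phrases quoted above are the only carrier phrases and applies both to every command; the function name is hypothetical, and a production grammar would more likely attach the appropriate phrase per command (for example, “Go to” for times and “Show me” for categories).

```python
from typing import Dict, Iterable

CARRIER_PHRASES = ("show me", "go to")  # the two lead-in phrases quoted in [0195]


def expand_with_carriers(commands: Iterable[str]) -> Dict[str, str]:
    """Map every accepted phrasing (lower-case) back to its canonical command."""
    grammar: Dict[str, str] = {}
    for command in commands:
        base = " ".join(command.lower().split())
        grammar[base] = command
        for prefix in CARRIER_PHRASES:
            grammar[f"{prefix} {base}"] = command
    return grammar
```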

[0196] Although the invention is described herein with reference to the preferred embodiment, one skilled in the art will readily appreciate that other applications may be substituted for those set forth herein without departing from the spirit and scope of the invention. For example, while the invention herein is described in connection with television services, those skilled in the art will appreciate that the invention also comprises any representational form of information with which a user interacts such as, for example, browser enabled technologies and would include the World Wide Web and information network access.

[0197] Accordingly, the invention should only be limited by the Claims included below.

1. A computer readable storage medium encoded with instructions, whichwhen loaded into a communications system establishes a global speechuser interface (GSUI), said GSUI comprising: means for transcribingspoken commands into commands acceptable by said communications system;means for navigating among applications hosted on said communicationssystem; and means for displaying a set of visual cues to help a user togive proper command.
 2. The GSUI of claim 1, wherein said visual cuescomprise: a set of immediate speech feedback overlays, each of whichprovides simple, non-textual feedback information about a state of saidcommunications system; a set of help overlays, each of which provides acontext-sensitive list of frequently used speech-activated commands foreach screen of every speech-activated application; a set of feedbackoverlays, each of which provides information about a problem that saidcommunications system is experiencing; and a main menu overlay thatshows a list of services available to the user, each of said servicesbeing accessible by spoken command.
 3. The GSUI of claim 2, furthercomprising a user center that provides any of: training and tutorials onhow to use said communications system; more help with specificspeech-activated applications; user account management; and usersettings and preferences for said communications system.
 4. The GSUI ofclaim 3, wherein each of said immediate speech feedback overlaysprovides simple, non-textual feedback information about a state of saidcommunications system, said state being any of: listening to the user'sspoken command; non-speech enabled alert; speech recognition processing;application alert; positive speech recognition; and speech recognitionunsuccessful.
 5. The GSUI of claim 3, wherein each of said help overlaysis accessible at all times.
 6. The GSUI of claim 3, wherein said list ofspeech-activated commands provided by said help overlay comprises anyof: a set of application-specific commands; a command associated withthe user center for more help; a command associated to said main menudisplay; and a command to make said overlay disappear.
 7. The GSUI ofclaim 3, wherein said set of feedback overlays comprises any of: a setof recognition feedback overlays that informs the user of a situationrelated to recognition; and a set of application overlays that informsthe user of an error or a problem related to an application used in saidGSUI.
 8. The GSUI of claim 7, wherein said set of recognition feedbackoverlays, in responding to unsuccessful recognitions that immediatelyfollow one another, is displayed in three different modes comprising: afirst mode wherein said immediate speech feedback indicator changes to aquestion mark in responding to the first unsuccessful recognition; asecond mode wherein a textual message and a link to said help overlayare displayed in responding to the second unsuccessful recognition; anda third mode wherein a textual message, a link to said help overlay, anda link to said more help overlay are displayed in responding to thethird and subsequent unsuccessful recognition.
 9. The GSUI of claim 2,wherein said visual cues further comprises a treatment of on-screen textwhich can be activated by a spoken command.
 10. The GSUI of claim 9,wherein said treatment is an overlay in round shape and green color. 11.The GSUI of claim 9, wherein said treatment can be turned on or off bythe user.
 12. The GSUI of claim 9, wherein said on-screen text comprises any of: a static text used in labels for on-screen graphics or in virtual buttons that may be selected by a cursor; and a dynamic text used in content wherein one or more words can be activated by a spoken command.
 13. The GSUI of claim 2, wherein any of said help overlays, feedback overlays and main menu overlay is implemented in a dialog box, said dialog box comprising any of: one or more text boxes for textual information; and one or more virtual buttons.
 14. The GSUI of claim 13, wherein said dialog box further comprises an identity indicator.
 15. The GSUI of claim 14, wherein said dialog box has an approximately transparent background.
 16. The GSUI of claim 14, wherein said dialog box has an opaque background.
 17. The GSUI of claim 15, wherein said approximately transparent background is incorporated with a dynamic image to enhance said identity indicator.
 18. The GSUI of claim 15, wherein said approximately transparent background is incorporated with a static image to enhance said identity indicator.
 19. The GSUI of claim 15, wherein said text box is overlaid on said approximately transparent background.
 20. The GSUI of claim 2, wherein said main menu overlay comprises: a first sub-menu overlay specifically for access to an interactive program guide system which provides cable television service; a second sub-menu overlay specifically for access to a video on demand system which provides cable video service; and a third sub-menu overlay specifically for access to a walled garden system which provides browser-based Internet service; wherein each of said sub-menus provides a set of speech-activated virtual buttons.
 21. The GSUI of claim 1, further comprising a speaker personalization and identification mechanism that allows a user to train said communications system with approximately forty seconds of speech and identifies the user by voice.
 22. The GSUI of claim 21, wherein said speaker personalization and identification mechanism can be activated and disabled by said particular user's command.
 23. The GSUI of claim 22, wherein said speaker personalization and identification mechanism can be used to block any other user's access to any application run on said communications system.
 24. In a speech-enabled communications system for facilitating a digital information service, said communications system including television, a set top box, a speech input system, and a head-end, wherein a user activates said speech input system by activating a switch associated with operation of a speech input device, a method for providing a set of immediate speech feedback overlays to inform a user of said communications system's states, said method comprising the steps of: (a) checking if a current screen is speech-enabled when said switch is activated; (b) if the current screen is speech-enabled, displaying a first tab signaling that a speech input system is activated; (c) if the current screen is not speech-enabled, displaying a second tab signaling a non speech-enabled alert, said second tab staying on screen for a first interval; and (d) if said switch is re-activated, repeating Step (a).
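Steps (a) through (c) of claim 24 reduce to a single branch on whether the current screen is speech-enabled. This is a minimal sketch under stated assumptions: the `Tab` type and the tab names are hypothetical, and the ten-second value is the approximate "first interval" given in claim 28.

```python
from dataclasses import dataclass
from typing import Optional

# "First interval" of step (c); claim 28 gives approximately ten seconds.
NON_SPEECH_ALERT_SECONDS = 10.0


@dataclass
class Tab:
    name: str                  # hypothetical identifier for the overlay tab
    timeout: Optional[float]   # seconds on screen, or None to stay until replaced


def on_switch_activated(screen_is_speech_enabled: bool) -> Tab:
    """Steps (a)-(c): choose the tab shown when the push-to-talk switch is pressed.
    Step (d) corresponds to calling this function again on the next activation."""
    if screen_is_speech_enabled:
        # Step (b): first tab, signaling that speech input is active
        return Tab("listening", None)
    # Step (c): second tab, non-speech-enabled alert shown for the first interval
    return Tab("not_speech_enabled", NON_SPEECH_ALERT_SECONDS)
```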
 25. The method of claim 24, wherein said first tab includes a solid image of an identity indicator.
 26. The method of claim 24, wherein said second tab comprises a prohibiting sign overlaid on said identity indicator.
 27. The method of claim 26, wherein said second tab can further comprise a text box for a textual message.
 28. The method of claim 24, wherein said first interval in Step (c) is approximately ten seconds.
 29. The method of claim 24, wherein said Step (b) further comprises the steps of: (e) if said switch is not deactivated within a second interval, interrupting recognition; (f) if said switch is deactivated after a third interval lapsed but before said second interval in Step (e) lapsed, displaying a third tab signaling that speech recognition is in processing; and (g) if said switch was deactivated before said third interval in Step (f) lapsed, removing any tab on the screen.
 30. The method of claim 29, wherein said second interval in Step (e) is approximately ten seconds and said third interval in Step (f) is approximately 0.1 second.
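Claims 29 and 30 partition the push-to-talk hold time into three outcomes. The sketch below is illustrative only: it classifies a completed hold by its total duration, whereas a real set top box would presumably fire step (e) from a timer while the switch is still held. The outcome strings and the inclusive/exclusive boundary handling are assumptions.

```python
# Intervals from claim 30 (both approximate in the source).
MAX_HOLD_SECONDS = 10.0   # "second interval": holding longer interrupts recognition
MIN_HOLD_SECONDS = 0.1    # "third interval": shorter presses are ignored


def outcome_for_hold(hold_seconds: float) -> str:
    """Map how long the switch was held to the steps of claim 29.
    Boundary handling (>= vs >) is an assumption, not taken from the claims."""
    if hold_seconds >= MAX_HOLD_SECONDS:
        return "interrupt_recognition"   # step (e)
    if hold_seconds >= MIN_HOLD_SECONDS:
        return "show_processing_tab"     # step (f)
    return "remove_tabs"                 # step (g)
```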
 31. The method of claim 29, wherein said third tab is a flashing identity indicator which is approximately 40% transparent.
 32. The method of claim 29, wherein said Step (f) further comprises the steps of: (h) if said set top box takes longer than a fourth interval measured from the time that the user releases said switch to the time that the last speech data is sent to said head-end, interrupting speech recognition processing and displaying a fourth tab signaling an application alert, said fourth tab staying on the screen for a fifth interval; and (i) if a remote control button other than said switch is pressed while a spoken command is being processed, interrupting speech recognition processing and removing any tab on the screen.
 33. The method of claim 32, wherein said fourth interval is approximately five seconds and said fifth interval is approximately ten seconds.
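Steps (h) and (i) of claim 32, with the intervals of claim 33, can be read as a check performed while a spoken command is being processed. This is a hypothetical sketch: the function name, its parameters, and the returned action labels are invented for illustration; `last_data_sent_at` is `None` while the set top box is still sending speech data to the head-end.

```python
from typing import Optional, Tuple

UPLOAD_TIMEOUT_SECONDS = 5.0   # "fourth interval" (claim 33)
APP_ALERT_SECONDS = 10.0       # "fifth interval" (claim 33)


def check_processing(released_at: float, now: float,
                     last_data_sent_at: Optional[float],
                     other_button_pressed: bool) -> Tuple[str, Optional[float]]:
    """Evaluate steps (h) and (i) at time `now` (seconds).
    Returns (action, seconds the resulting tab stays on screen)."""
    if other_button_pressed:
        return ("interrupt_and_remove_tabs", None)       # step (i)
    # Upload time runs from switch release to the last speech data sent.
    finished = last_data_sent_at if last_data_sent_at is not None else now
    if finished - released_at > UPLOAD_TIMEOUT_SECONDS:
        return ("show_app_alert", APP_ALERT_SECONDS)     # step (h)
    return ("keep_processing", None)
```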
 34. The method of claim 32, wherein said fourth tab comprises an exclamation point overlaid on said identity indicator.
 35. The method of claim 34, wherein said fourth tab can further comprise a text box for a textual message.
 36. The method of claim 32, wherein said Step (h) further comprises the steps of: (j) if said switch is re-activated while said fourth tab is on the screen, removing the fourth tab and repeating Step (a); and (k) when said fifth interval lapses or if a remote control button other than said switch is activated while said fourth tab is on the screen, removing said fourth tab.
 37. The method of claim 29, wherein said Step (f), upon a complete recognition, further comprises the steps of: (l) checking whether the speech recognition is successful; (m) if the speech recognition is successful, displaying a fifth tab signaling a positive speech recognition, said fifth tab staying on the screen for approximately one second; and (n) if said switch is re-activated before said fifth tab disappears, repeating Step (a).
 38. The method of claim 37, wherein said fifth tab comprises a check mark overlaid on said identity indicator.
 39. The method of claim 29, wherein said Step (l) further comprises the steps of: (o) if the speech recognition is unsuccessful, checking the number of unsuccessful recognitions, which is automatically tracked by said communications system, said number being reset to zero after each successful recognition or when any button of said remote control device is pressed; (p) if the complete recognition is the first unsuccessful recognition, displaying a sixth tab signaling a misrecognition, said sixth tab staying on the screen for about one second; and (q) if said switch is re-pressed before said sixth tab disappears, repeating Step (a).
 40. The method of claim 39, wherein said sixth tab in Step (p) is a question mark overlaid on said identity indicator.
 41. The method of claim 39, wherein said Step (o) further comprises the steps of: (r) if the complete recognition is the second unsuccessful recognition, displaying a first variant of said sixth tab signaling a misrecognition and displaying a short textual message, said first variant of said sixth tab staying on the screen for about ten seconds; and (s) if said switch is re-pressed before said first variant of said sixth tab disappears, repeating Step (a).
 42. The method of claim 41, wherein said first variant of said sixth tab comprises: a question mark overlaid on said identity indicator; and a short text box displaying a short textual message.
 43. The method of claim 39, wherein said Step (o) further comprises the steps of: (t) if the complete recognition is the third unsuccessful recognition, displaying a second variant of said sixth tab signaling a misrecognition and displaying a long textual message, said second variant of said sixth tab staying on the screen for about ten seconds; and (u) if said switch is re-activated before said second variant of said sixth tab disappears, repeating Step (a).
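Claims 37 through 43 describe a counter of unsuccessful recognitions that selects which tab to show and is reset by a success or by any remote button press. The class below is a hypothetical sketch of that bookkeeping; the tab labels are invented, and treating the fourth and later misses like the third is an assumption borrowed from claim 8 rather than stated in claim 43.

```python
from typing import Tuple


class MisrecognitionTracker:
    """Hypothetical counter for the unsuccessful-recognition logic of claims 37-43."""

    def __init__(self) -> None:
        self.failures = 0

    def on_success(self) -> Tuple[str, float]:
        self.failures = 0                  # reset after each successful recognition
        return ("check_mark", 1.0)         # fifth tab, about one second (claim 37)

    def on_remote_button(self) -> None:
        self.failures = 0                  # reset when any remote button is pressed

    def on_failure(self) -> Tuple[str, float]:
        self.failures += 1
        if self.failures == 1:
            return ("question_mark", 1.0)              # step (p)
        if self.failures == 2:
            return ("question_mark_short_text", 10.0)  # step (r)
        # Step (t) names the third miss; reusing it for later misses is an
        # assumption borrowed from claim 8 ("third and subsequent").
        return ("question_mark_long_text", 10.0)
```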
 44. The method of claim 29, wherein said Step (e) further comprises the steps of: (v) displaying a first variant of said fourth tab, said first variant staying on the screen for a sixth interval; (w) removing said first variant of said fourth tab from the screen if said switch is deactivated after said sixth interval lapsed; and (x) displaying a second variant of said fourth tab, said second variant staying on the screen until said switch is deactivated.
 45. The method of claim 44, wherein said first variant comprises an exclamation point and a first textual message.
 46. The method of claim 44, wherein said sixth interval is approximately ten seconds.
 47. The method of claim 44, wherein said second variant comprises an exclamation point and a second textual message.
 48. In a speech-enabled communications system for facilitating a digital information service, said communications system including television, a set top box, a speech input system, and a head-end, wherein a user activates said speech input system by activating a switch associated with operation of a speech input device, a method for providing help information by displaying a set of overlays on the user's screen, said method comprising the computer-implemented steps of: (a) displaying a first help overlay if a help command is successfully recognized, said first help overlay staying on the screen for a specific interval; (b) removing said first help overlay from the screen if any of the following occurs: said specific interval lapses; any button of said speech input device is accidentally activated; and an exit button incorporated in said first help overlay is selected; and (c) displaying a second help overlay while said switch is activated for inputting a new spoken command.
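The removal conditions of claim 48 can be expressed as a small transition function. This sketch is an assumption-laden illustration: the event names are hypothetical, the claim leaves the "specific interval" unspecified so it is passed in as a parameter, and the push-to-talk switch is handled separately from other buttons, following the wording of claim 54.

```python
def help_overlay_after(event: str, elapsed: float, interval: float) -> str:
    """Return which help overlay is visible ('first', 'second' or 'none')
    after `event` occurs while the first help overlay is showing.
    Event names are hypothetical; `interval` is the claim's unspecified
    'specific interval'."""
    if event == "switch_pressed":
        return "second"          # step (c): user is inputting a new spoken command
    if event in ("other_button_pressed", "exit_selected"):
        return "none"            # step (b): dismissed by a button or the exit button
    if elapsed >= interval:
        return "none"            # step (b): specific interval lapsed
    return "first"               # step (a): still displayed
```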
 49. The method of claim 48, wherein said first help overlay is a dialog box which includes a first tab signaling a positive speech recognition, a text box for textual help information, and one or more virtual buttons.
 50. The method of claim 49, wherein said first tab is a check mark overlaid on a non-highlighted identity indicator.
 51. The method of claim 49, wherein said text box further includes a “more help” link.
 52. The method of claim 49, wherein said text box includes one or more speech-activated words indicated by a speakable text indicator.
 53. The method of claim 48, wherein said second help overlay is a dialog box which includes a second tab signaling said switch's activation, a text box for textual help information, and one or more virtual buttons.
 54. In a speech-enabled communications system for facilitating a digital information service, said communications system including television, a set top box, a speech input system, and a head-end, wherein a user activates said speech input system by activating a switch associated with operation of a speech input device, a method for providing a main menu by displaying a set of overlays on the user's screen, said method comprising the computer-implemented steps of: (a) displaying a first main menu overlay if the speech recognition is successful, said first main menu overlay staying on the screen for a specific interval; (b) removing said first main menu overlay from the screen if any of the following occurs: said specific interval lapses; any button of said speech input device other than said switch is accidentally activated; and an exit virtual button incorporated in said first main menu overlay is selected; and (c) displaying a second main menu overlay while said switch is activated for inputting a new spoken command.
 55. The method of claim 54, wherein said first main menu overlay is a dialog box which includes a first tab signaling a positive speech recognition, a text box for textual menu information, and one or more virtual buttons.
 56. The method of claim 54, wherein said first tab is a check mark overlaid on a non-highlighted identity indicator.
 57. The method of claim 54, wherein said text box includes one or more speech-activated words indicated by a speakable text indicator.
 58. The method of claim 54, wherein said second main menu overlay is a dialog box which includes a second tab signaling said switch's activation, a text box for textual menu information, and one or more virtual buttons.
 59. A speech-enabled interactive television interfacing system, comprising: an interconnection device which connects a television set with a television service provider; a speech-enabled remote control device which transforms a user's spoken commands into signals acceptable by said interconnection device; and means for displaying a set of visual cues on a television screen to help the user give operable commands.
 60. The system of claim 59, wherein said interconnection device comprises a volume indicator, and wherein said speech-enabled remote control device comprises a push-to-talk button, said button being in the same color as said volume indicator and any on-screen graphic indicating speech-enabled user interface elements.
 61. The system of claim 59, wherein said means for displaying provides immediate real-time visual feedback indicating various states of speech recognition activities.
 62. The system of claim 61, wherein said real-time visual feedback comprises a set of overlays, each of which provides simple, non-textual feedback information about a state of speech recognition activities, said state being any of: receiving spoken utterance; processing utterance; successful recognition; unsuccessful recognition; and command not allowed.
 63. The system of claim 59, wherein said visual cues provide escalating help feedback when the user's spoken command is not recognized with a predefined degree of confidence.
 64. The system of claim 63, wherein said escalating help feedback comprises a set of feedback overlays to reveal progressive help information.
 65. The system of claim 64, wherein each of said feedback overlays provides a context-sensitive list of frequently used speech-enabled commands for each screen.
 66. The system of claim 64, wherein each of said feedback overlays is accessible at all times.
 67. The system of claim 65, wherein said list of frequently used speech-enabled commands comprises any of: a set of application-specific commands; a command associated with a user center for more help information; a command associated with a main menu display; and a command to make said overlay disappear from the screen.
 69. The system of claim 59, wherein said means for displaying allows the user to initiate, via spoken command, an overlay display which indicates selectable user interface elements.
 70. The system of claim 69, wherein said selectable user interface elements comprise any of: numeric identifications; navigation options; and application control options.
 71. The system of claim 59, wherein when the user's spoken command is not recognized with a predefined degree of confidence, said means for displaying presents a list of predicted commands prompting the user to select from said list.
 72. The system of claim 59, further comprising: means for navigating on-screen list-based information via spoken commands.
 73. The system of claim 72, wherein said means for navigating enables the user to make said on-screen list-based information scroll up or scroll down by speaking a corresponding command.
 74. The system of claim 72, wherein said means for navigating enables the user to select an item from said on-screen list-based information by speaking a letter or a number identifying said item.
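Claims 73 and 74 imply a mapping from a recognized utterance to a list action. The resolver below is a hypothetical sketch: it assumes the recognizer has already returned plain text, that "scroll up" and "scroll down" are the exact command phrases, and that items are identified by 1-based numbers or by letters starting at "a". None of these conventions is specified in the claims.

```python
import string
from typing import List, Optional


def resolve_list_command(utterance: str, items: List[str]) -> Optional[str]:
    """Map a recognized utterance to an action on an on-screen list.
    Returns 'scroll_up', 'scroll_down', 'select:<item>', or None."""
    text = utterance.strip().lower()
    if text == "scroll up":
        return "scroll_up"                      # claim 73
    if text == "scroll down":
        return "scroll_down"                    # claim 73
    index = None
    if text.isdigit():
        index = int(text) - 1                   # "3" -> third item (1-based)
    elif len(text) == 1 and text in string.ascii_lowercase:
        index = ord(text) - ord("a")            # "b" -> second item
    if index is not None and 0 <= index < len(items):
        return "select:" + items[index]         # claim 74
    return None
```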
 75. The system of claim 72, wherein said means for navigating enables the user to select an item from said on-screen list-based information by speaking the name of said item.
 76. The system of claim 59, further comprising: means for allowing the user to navigate directly between applications via spoken command or a speech-enabled menu.
 77. The system of claim 59, further comprising: means for allowing the user to navigate directly to previously bookmarked pages via spoken command.
 78. The system of claim 77, wherein said direct navigation to previously bookmarked pages operates within and between applications.
 79. A speech-enabled interactive television interfacing system, comprising: an interconnection device which connects a television set with a television service provider; a speech-enabled remote control device which transforms a user's spoken commands into signals acceptable by said interconnection device; and means for allowing the user to navigate television programs by spoken command.
 80. The system of claim 79, further comprising: means for allowing the user to initiate via spoken command an automatic scan search for television programs pursuant to a search category, wherein each matching program remains on screen for a short period of time before advancing to the next matching program.
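The automatic scan search of claim 80 can be modeled as a schedule in which each matching program occupies the screen for a fixed dwell time. This is an illustrative sketch only: the program record layout (`title`, `categories`) and the three-second default dwell are hypothetical, since the claim says only "a short period of time".

```python
from typing import Iterable, List, Mapping, Tuple


def scan_schedule(programs: Iterable[Mapping], category: str,
                  dwell_seconds: float = 3.0) -> List[Tuple[float, str]]:
    """Return (start_offset_seconds, title) for each program in `category`.
    Each match stays on screen for dwell_seconds before the scan advances."""
    schedule = []
    offset = 0.0
    for program in programs:
        if category in program["categories"]:
            schedule.append((offset, program["title"]))
            offset += dwell_seconds
    return schedule
```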
 81. The system of claim 79, further comprising: means for allowing the user to search, via spoken command, for particular television programs by specific attributes.
 82. The system of claim 79, further comprising: means for allowing the user to perform any of: adding television programs to categories; editing television programs in categories; and deleting television programs from categories.
 83. The system of claim 82, further comprising: means for allowing the user to set parental control, with which children are blocked from accessing controlled television channels or television programs.
 84. The system of claim 79, further comprising: means for allowing the user to filter groups of television programs by specific attributes.
 85. A speech-enabled interactive television interfacing system, comprising: an interconnection device which connects a television set with a television service provider; a speech-enabled remote control device which transforms a user's spoken commands into signals acceptable by said interconnection device; and an interactive program guide that the user can access via spoken command.
 86. The system of claim 85, wherein said interactive program guide comprises: means for allowing the user to, via spoken command, sort television programs by category.
 87. The system of claim 86, wherein said interactive program guide comprises: means for allowing the user to set parental controls, with which children are blocked from accessing controlled television channels or television programs.
 88. The system of claim 85, wherein said interactive program guide comprises: means for allowing the user to, via spoken command, set reminders for television programs to play in the future.
 89. The system of claim 85, wherein said interactive program guide comprises: means for allowing the user to, via spoken command, search television programs based on specific criteria.
 90. The system of claim 85, wherein said interactive program guide comprises: means for processing pay-per-view purchases.
 91. The system of claim 85, wherein said interactive program guide comprises: means for allowing the user to, via spoken command, access and upgrade premium television services.
 92. A speech-enabled interactive television interfacing system, comprising: an interconnection device which connects a television set with a television service provider; a speech-enabled remote control device which transforms a user's spoken commands into signals acceptable by said interconnection device; and an interactive video on demand service, from which the user can order any video program contained in a list.
 93. The system of claim 92, wherein said video on demand service comprises: means for allowing the user to, via spoken command, sort video programs by categories.
 94. The system of claim 92, wherein said video on demand service comprises: means for allowing the user to, via spoken command, search video programs by properties.
 95. The system of claim 92, wherein said video on demand service comprises: means for allowing the user to, via spoken command, set parental control with which children are blocked from accessing controlled video programs.
 96. The system of claim 92, wherein said video on demand service comprises: means for allowing the user to obtain automatic recommendations based on voiceprint identification.
 97. A speech-enabled interactive television interfacing system, comprising: an interconnection device which connects a television set with a television service provider; a speech-enabled remote control device which transforms a user's spoken commands into signals acceptable by said interconnection device; and a speech-enabled interface that allows the user to, via spoken command, conduct instant messaging communication.
 98. A speech-enabled interactive television interfacing system, comprising: an interconnection device which connects a television set with a television service provider; a speech-enabled remote control device which transforms a user's spoken commands into signals acceptable by said interconnection device; and a speech-enabled interface that allows the user to, via spoken command, activate links to television advertisement or banner advertisement contained in an application screen.
 99. A speech-enabled interactive television interfacing system, comprising: an interconnection device which connects a television set with a television service provider; a speech-enabled remote control device which transforms a user's spoken commands into signals acceptable by said interconnection device; and means for targeting television advertisement or banner advertisement contained in an application screen to the user based on voiceprint identification.
 100. A speech-enabled interactive television interfacing system, comprising: an interconnection device which connects a television set with a television service provider; a speech-enabled remote control device which transforms a user's spoken commands into signals acceptable by said interconnection device; and means for targeting television programming recommendations to the user based on voice identification.
 101. A speech-enabled interactive television interfacing system, comprising: an interconnection device which connects a television set with a television service provider; a speech-enabled remote control device which transforms a user's spoken commands into signals acceptable by said interconnection device; and means for delivering personalized information to the user based on voice identification.
 102. A speech-enabled interactive television interfacing system, comprising: an interconnection device which connects a television set with a television service provider; a speech-enabled remote control device which transforms the user's spoken commands into signals acceptable by said interconnection device; and means for automatically configuring the user's interface preferences based on voiceprint identification.
 103. A speech-enabled interactive television interfacing system, comprising: an interconnection device which connects a television set with a television service provider; a speech-enabled remote control device which transforms the user's spoken commands into signals acceptable by said interconnection device; and means for allowing the user to complete all aspects of a transaction via spoken commands.
 104. A speech-enabled interactive television interfacing system, comprising: an interconnection device which connects a television set with a television service provider; a speech-enabled remote control device which transforms the user's spoken commands into signals acceptable by said interconnection device; and means for allowing the user to exercise central control, via spoken commands, over home services and devices.